A Formal Approach to Score Normalization for Metasearch

Authors

  • R. Manmatha
  • H. Sever
Abstract

Metasearch, the combination of the outputs of different search engines in response to a query, has been shown to improve performance. Since the scores produced by different search engines are not comparable, researchers have often decomposed the metasearch problem into a score normalization step followed by a combination step. Combination has been studied by many researchers. While appropriate normalization can affect performance, most of the normalization schemes suggested are ad hoc in nature. In this paper, we propose a formal approach to normalizing scores for metasearch by taking the distributions of the scores into account. Recently, it has been shown that for search engines the score distributions for a given query may be modeled using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Here, we show that equalizing the score distributions of the top non-relevant documents yields the best metasearch performance reported in the literature. Since relevance information is not available a priori, we discuss two different ways of obtaining a good approximation to the distribution of scores of non-relevant documents. The first is obtained by looking at the distribution of scores of all documents. The second is obtained by fitting a mixture model of an exponential and a Gaussian to the scores of all documents and using the resulting exponential distribution as an estimate of the non-relevant distribution. We show with experiments on TREC-3, TREC-4 and TREC-9 data that the best combination results are obtained by averaging the parameters obtained from these approximations. These techniques work on a variety of search engines, including vector space search engines like SMART and probabilistic search engines like INQUERY.
The problem of normalization is important in many other areas including information filtering, topic detection and tracking, multilingual search and distributed retrieval. Thus, the techniques proposed here are likely to be applicable to many of these tasks.
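
The abstract gives no formulas, so the following Python sketch is only one plausible reading of the first approximation (using the scores of all documents as a stand-in for the non-relevant scores). It assumes that "equalizing" means shifting each engine's minimum score to zero and rescaling so that the exponential fitted by maximum likelihood has unit mean. The function names `normalize` and `comb_sum`, and the use of a simple score sum for the combination step, are illustrative assumptions and are not taken from the paper.

```python
from statistics import mean


def normalize(scores):
    """Shift the minimum score to 0, then divide by the mean of the shifted scores.

    The maximum-likelihood exponential fit to the shifted scores has mean equal
    to their sample mean, so after division every engine's fitted exponential
    has mean 1. Using all documents as a proxy for the non-relevant ones is an
    assumption of this sketch.
    """
    lowest = min(scores.values())
    shifted = {doc: s - lowest for doc, s in scores.items()}
    mu = mean(shifted.values())
    if mu == 0:  # all scores identical: nothing to rescale
        return {doc: 0.0 for doc in shifted}
    return {doc: s / mu for doc, s in shifted.items()}


def comb_sum(runs):
    """Sum normalized scores per document (an assumed combination rule)."""
    fused = {}
    for run in runs:
        for doc, s in normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from two engines on very different scales.
engine_a = {"d1": 10.0, "d2": 4.0, "d3": 1.0}
engine_b = {"d1": 0.9, "d2": 0.2, "d4": 0.1}
ranking = comb_sum([engine_a, engine_b])
```

After normalization both engines contribute scores on a comparable scale, so a document ranked highly by both (here `d1`) rises to the top of the fused list. The mixture-model approximation described in the abstract would replace the sample mean with the mean of the exponential component of the fitted mixture.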

Similar resources

Comparison of Normalization Techniques for Metasearch

It is a well-known fact that the combination of the retrieval outputs of different search systems in response to a query, known as metasearch, improves performance on average, provided that the combined systems (1) have compatible outputs, (2) produce accurate probability of relevance estimates for documents, and (3) are independent of each other. The objective of a normalization technique is to ...

Analyse de la robustesse des algorithmes de méta-recherche discriminante

This paper studies the sensitivity of four metasearch engines under different situations. The focus of this analysis is on trainable metasearch engines. Our main contribution is a large scale systematic analysis of the performance and behavior of these methods on several corpora. Firstly, we analyze how the choice and normalization of the relevance score delivered by base search engines influen...

Search Result Merging and Ranking Strategies in Meta-Search Engines: A Survey

MetaSearch is the use of multiple other search systems to perform a simultaneous search. A MetaSearch Engine (MSE) is a search system that enables MetaSearch. To perform a MetaSearch, the user query is sent to multiple search engines; once the search results are returned, they are received by the MSE, merged into a single ranked list, and presented to the user. When a query is submi...

A Study of Metaindex Mechanisms for Selecting and Ranking Remote Search Engines

With the popularity of computer and network technologies, the number of web sites has rapidly increased, which has significantly stimulated the development of search engines. In recent years, a considerable number of search engines have been developed. The problem of how to find information on the Internet has been replaced with the problems of where to find the search engines, what they are d...

Normalization of Parents’ Response to Children’s Positive Emotions Scale

Abstract This study evaluated the normalization of the Persian version of the Parents’ Response to Children’s Positive Emotions Scale (PRCPS). For evaluating reliability and validity of this scale through random sampling, 400 mothers of 4-7-year-old children completed the PRCPS and Cognitive Emotion Regulation Questionnaire (CERQ). Evaluating internal reliability of PRCPS subscales by Cronba...

Publication date: 2005